The invention relates to motion-compensated predictive image encoding and
decoding.

As set out in more detail in Sections 1-3 of the first priority application,
motion-compensated predictive image encoding and decoding is well known in the art, see
References [1]-[4]. A high-quality 3-Dimensional Recursive Search block matching
algorithm, also described in the first priority application, is known from References [5]-[7].

As set out in the first priority application, a first motion-compensated predictive
image encoding technique (the H.263 standard) is known in which motion vectors are
estimated and used for 16*16 macro-blocks. This large macro-block size results in a
relatively low number of bits for transmitting the motion data. On the other hand, the
motion-compensation is rather coarse. In an extension of the H.263 standard, motion vectors
are used and transmitted for smaller 8*8 blocks: more motion data, but a less coarse
motion-compensation. However, the higher number of bits required for motion data means that
fewer bits are available for transmitting image data, so that the overall improvement in
image quality is less than desired.

It is, inter alia, an object of the invention to provide improved
motion-compensated predictive image encoding and decoding techniques. To this end, a first
aspect of the invention provides an image encoding method and device as defined in claims 1
and 3. A second aspect of the invention provides an image decoding method and device as
defined in claims 4 and 6. Further aspects of the invention provide a multi-media apparatus
(claim 7), an image signal display apparatus (claim 8), and an image signal (claim 9).
Advantageous embodiments are defined in dependent claims 2 and 5.

In a method of motion-compensated predictive image encoding in accordance
with a primary aspect of the present invention, first motion vectors are estimated for first
objects, the first motion vectors are filtered to obtain second motion vectors for second
objects, the second objects being smaller than the first objects, prediction errors are
generated in dependence on the second motion vectors, and the first motion vectors and the
prediction errors are combined.

These and other aspects of the invention will be apparent from and elucidated 
with reference to the embodiments described hereinafter. 


In the drawings: 

Fig. 1 shows a basic DPCM/DCT video compression block diagram in
accordance with the present invention;

Fig. 2 shows a temporal prediction unit having a motion vector post-filter
(MVPF) in accordance with the present invention;

Fig. 3 illustrates block erosion from one vector per 16*16 macro-block to one
vector for every 8*8 block;

Fig. 4 shows a decoder block diagram in accordance with the present invention;
and

Fig. 5 shows an image signal reception device in accordance with the present
invention.

In the image encoder of Fig. 1, an input video signal IV is applied to a frame
skipping unit 1. An output of the frame skipping unit 1 is connected to a non-inverting input
of a subtracter 3 and to a first input of a change-over switch 7. The output of the frame
skipping unit 1 further supplies a current image signal to a temporal prediction unit 5. An
inverting input of the subtracter 3 is connected to an output of the temporal prediction unit 5.
A second input of the change-over switch 7 is connected to an output of the subtracter 3. An
output of the change-over switch 7 is connected to a cascade arrangement of a Discrete
Cosine Transformation encoder DCT and a quantizing unit Q. An output of the quantizing
unit Q is connected to an input of a variable length encoder VLC, an output of which is
connected to a buffer unit BUF that supplies an output bit-stream OB.

The output of the quantizing unit Q is also connected to a cascade arrangement
of a de-quantizing unit Q^-1 and a DCT decoder DCT^-1. An output of the DCT decoder DCT^-1
is coupled to a first input of an adder 9, a second input of which is coupled to the output of
the temporal prediction unit 5 through a switch 11. An output of the adder 9 supplies a
reconstructed previous image to the temporal prediction unit 5. The temporal prediction unit
5 calculates motion vectors MV which are also encoded by the variable length encoder VLC.

The buffer unit BUF supplies a control signal to the quantizing unit Q, and to a
coding selection unit 13 which supplies an Intra-frame / Predictive encoding control signal
I/P to the switches 7 and 11. If intra-frame encoding is carried out, the switches 7, 11 are in
the positions shown in Fig. 1.
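
The prediction loop described above can be illustrated by a minimal sketch. It is not the
encoder of Fig. 1 itself: the DCT and its inverse are omitted, a plain uniform quantizer stands
in for Q and Q^-1, and the function names and the step size are assumptions made only for this
illustration. It is also assumed that, in intra-frame mode, switch 7 passes the current image
directly and switch 11 disconnects the prediction from the adder 9.

```python
def quantize(values, step):
    """Uniform quantizer standing in for the DCT + Q cascade (DCT omitted)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Inverse quantizer standing in for the Q^-1 + DCT^-1 cascade."""
    return [q * step for q in levels]


def encode_block(current, prediction, intra, step=4):
    """One pass through the prediction loop for a flat list of pixel values.

    Returns (levels, reconstructed): the quantized data passed on to the
    variable length encoder, and the reconstructed pixels fed back to the
    temporal prediction unit.
    """
    if intra:
        # Assumed switch positions: the subtracter is bypassed and the
        # prediction is not added back.
        source = list(current)
        feedback = [0] * len(current)
    else:
        # Predictive mode: encode the prediction error only.
        source = [c - p for c, p in zip(current, prediction)]
        feedback = list(prediction)
    levels = quantize(source, step)
    reconstructed = [d + f for d, f in zip(dequantize(levels, step), feedback)]
    return levels, reconstructed
```

The reconstructed values, not the original ones, are fed back, so that the encoder predicts
from the same picture that a decoder would have available.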

In accordance with the present invention, the image encoder of Fig. 1 is 
characterized by the special construction of the temporal prediction unit 5 which will be 
described in more detail by means of Fig. 2. 

As shown in Fig. 2, the temporal prediction unit 5 includes a motion estimator
ME and a motion-compensated interpolator MCI which both receive the current image from
the frame skipping unit 1 and the reconstructed previous image from the adder 9. In
accordance with the present invention, the motion vectors MV calculated by the motion
estimator ME are filtered by a motion vector post-filter MVPF before being applied to the
motion-compensated interpolator MCI.

In this Section we describe the truly innovative part of our proposal, the
motion vector post-filtering (MVPF). Preferably, we use the overlapped block
motion-compensation based on blocks of size 8*8, as specified in the Advanced
Prediction Mode (APM) of the H.263 standard (described in more detail in the first priority
application), in both the encoding and decoding terminals, while transmitting and receiving
only macro-block (MB) motion vectors estimated for 16*16 macro-blocks, so as not to increase
the bit-rate. This means that both terminals have to use the same MVPF to re-assign the MB
vectors to blocks of 8*8 pixels, as performed in the motion estimation part of APM. Fig. 2
shows the temporal prediction unit 5 including the MVPF.

Even though the MVPF should not depend on the estimation strategy, we strongly
recommend using it jointly with the motion estimator described in References [5]-[7], to
obtain the best performance. Of course, there are several ways to calculate the 8*8
block vectors, for example by a weighted averaging of the adjacent 16*16 macro-block
vectors. However, we describe in detail only what we consider the best solution given
the inherent features of our new motion estimator: the block erosion MVPF.

As reported in References [1]-[4], in the H.263 standard the motion information
is limited to one vector per macro-block of X*Y = 16*16 pixels. Therefore, in accordance
with a preferred embodiment, the MVPF performs a block erosion to eliminate fixed block
boundaries from the vector field, by re-assigning a new vector to each block of size
(X/2)*(Y/2) = 8*8.


If MVc = d(b_c, t) is a macro-block vector centered in b_c and its four adjacent
macro-block vectors are given by:

MVl = d(b_c - (X, 0)^T, t)
MVr = d(b_c + (X, 0)^T, t)
MVa = d(b_c - (0, Y)^T, t)
MVb = d(b_c + (0, Y)^T, t)

the four 8*8 blocks, numbered as in Fig. 3, will be assigned their new vectors according to 
the following: 

MV1 = median(MVl, MVc, MVa) 
MV2 = median(MVa, MVc, MVr) 
MV3 = median(MVl, MVc, MVb)
MV4 = median(MVr, MVc, MVb) 
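
As an illustration, the four median expressions above can be sketched as follows. The median
is taken separately on the x and y components, and vectors are represented as (x, y) tuples;
the function names are ours and serve only this example.

```python
def median3(a, b, c):
    """Median of three scalars."""
    return sorted((a, b, c))[1]


def vector_median(u, v, w):
    """Component-wise median of three (x, y) motion vectors."""
    return (median3(u[0], v[0], w[0]), median3(u[1], v[1], w[1]))


def block_erosion(mv_c, mv_l, mv_r, mv_a, mv_b):
    """Return the block vectors (MV1, MV2, MV3, MV4) for one macro-block.

    mv_c is the vector of the macro-block itself; mv_l, mv_r, mv_a and mv_b
    are the vectors of the left, right, above and below macro-blocks.
    """
    return (
        vector_median(mv_l, mv_c, mv_a),  # MV1
        vector_median(mv_a, mv_c, mv_r),  # MV2
        vector_median(mv_l, mv_c, mv_b),  # MV3
        vector_median(mv_r, mv_c, mv_b),  # MV4
    )
```

For example, with MVc = (4, 0), MVl = (0, 0), MVr = (4, 0), MVa = (0, 2) and MVb = (4, 2), the
block nearest to the left and above neighbours takes the vector (0, 0), while the three other
blocks keep (4, 0).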

More specifically, the filtering step MVPF comprises the steps of:

providing x and y motion vector components of a given macro-block MVc and
of macro-blocks MVl, MVr, MVa, MVb adjacent to the given macro-block MVc; and

supplying for each block MV1 of a number of blocks MV1-MV4 corresponding
to the given macro-block MVc, x and y motion vector components respectively selected from
the x and y motion vector components of the given macro-block MVc and from the x and y
motion vector components of two blocks MVl, MVa adjacent to the block MV1.
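
Applied to a whole picture, the two steps above turn a grid of macro-block vectors into a
grid with twice as many rows and columns. The sketch below shows one way this could be done.
Two points are assumptions of ours and are not taken from the description: blocks MV1-MV4
are laid out as top-left, top-right, bottom-left and bottom-right, and a neighbour that falls
outside the picture is replaced by the vector of the macro-block itself.

```python
def median3(a, b, c):
    """Median of three scalars."""
    return sorted((a, b, c))[1]


def erode_field(mb_field):
    """Expand a rows x cols grid of (x, y) macro-block vectors into a
    (2*rows) x (2*cols) grid of block vectors by block erosion."""
    rows, cols = len(mb_field), len(mb_field[0])

    def at(r, c, fallback):
        # Assumption: outside the picture, reuse the centre vector.
        if 0 <= r < rows and 0 <= c < cols:
            return mb_field[r][c]
        return fallback

    out = [[None] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            mv_c = mb_field[r][c]
            mv_l, mv_r = at(r, c - 1, mv_c), at(r, c + 1, mv_c)
            mv_a, mv_b = at(r - 1, c, mv_c), at(r + 1, c, mv_c)
            for dr, vert in ((0, mv_a), (1, mv_b)):
                for dc, horiz in ((0, mv_l), (1, mv_r)):
                    # Select x and y components independently.
                    out[2 * r + dr][2 * c + dc] = tuple(
                        median3(horiz[k], mv_c[k], vert[k]) for k in range(2)
                    )
    return out
```

With the 2*2 field [[(0, 0), (4, 0)], [(0, 0), (0, 0)]], only three of the four blocks of
the top-right macro-block keep the vector (4, 0); its bottom-left block is re-assigned
(0, 0), so the vector boundary no longer follows the macro-block grid.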

Fig. 3 shows the block erosion of a macro-block vector MVc for a 16*16
macro-block into four block vectors MV1, MV2, MV3, MV4 for 8*8 blocks. Block erosion
as such for use in a field-rate converter in a television receiver is known from
US-A-5,148,269 (Attorneys' docket PHN 13,396). That patent does not suggest that block
erosion can advantageously be used to transmit motion vectors estimated for macro-blocks,
while a four times larger number of vectors is used in both the encoder and the decoder to
obtain prediction errors for blocks which are four times smaller than the macro-blocks.

This solution has not been mentioned in the H.263 standard, but it is fully
H.263 compatible. At the start of the multi-media communication the two terminals exchange
data about their processing standard and non-standard capabilities (see Reference [4] for
more details). If we assume that, during the communication set-up, both terminals declare
this MVPF capability, they will easily interface with each other. Hence, the video encoder
will transmit only MB vectors for 16*16 macro-blocks, while the video decoder will
post-filter them in order to have a different vector for every 8*8 block. In the temporal
interpolation process both terminals use the overlapped block motion compensation, as
specified in the H.263 APM. Thanks to this method, we can achieve the same image quality
as if the APM were used, but without increasing the bit-rate.

If at least one terminal declares that it does not have this capability, a flag can be
forced in the other terminal to switch it off.
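
The set-up rule can be sketched as follows. The sketch does not model any actual signalling
message: the Terminal class and its supports_mvpf flag are hypothetical names introduced only
to show that the post-filter is enabled when, and only when, both sides have declared it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminal:
    name: str
    supports_mvpf: bool  # hypothetical capability flag declared at set-up


def mvpf_enabled(a: Terminal, b: Terminal) -> bool:
    """MVPF is used only if both terminals declared it; otherwise it is
    switched off in the terminal that has it."""
    return a.supports_mvpf and b.supports_mvpf
```

Because the encoder and the decoder must derive identical 8*8 block vectors from the
transmitted macro-block vectors, a one-sided use of the post-filter would make the two
prediction loops diverge, which is why the flag is forced off in that case.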

Fig. 4 shows a decoder in accordance with the present invention. An incoming
bit-stream is applied to a buffer BUFF having an output which is coupled to an input of a
variable length decoder VLC^-1. The variable length decoder VLC^-1 supplies image data to a
cascade arrangement of an inverse quantizer Q^-1 and a DCT decoder DCT^-1. An output of the
DCT decoder DCT^-1 is coupled to a first input of an adder 15, an output of which supplies
the output signal of the decoder. The variable length decoder VLC^-1 further supplies motion
vectors MV for 16*16 macro-blocks to a motion vector post-filter MVPF to obtain motion
vectors for 8*8 blocks. These latter motion vectors are applied to a motion-compensation unit
MC which receives the output signal of the decoder. An output signal of the
motion-compensation unit MC is applied to a second input of the adder 15 through a switch 17
which is controlled by an Intra-frame / Predictive encoding control signal I/P from the
variable length decoder VLC^-1.
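
The reconstruction path of the decoder (motion-compensation unit MC, switch 17 and adder 15)
can be sketched as follows. This is a simplified illustration, not the decoder of Fig. 4: it
takes the already post-filtered block vectors as input, uses integer-pixel vectors and plain
(non-overlapped) block copying, and clamps references at the picture border. The sign
convention of the vectors and all function names are assumptions made for this example.

```python
def motion_compensate(prev, block_mvs, block=8):
    """Predict a picture from the previous decoded picture.

    prev: 2-D list of pixels; block_mvs[r][c]: integer (dx, dy) for block
    (r, c). Each pixel is fetched from prev at the displaced position.
    Assumption: positions outside the picture are clamped to the border.
    """
    h, w = len(prev), len(prev[0])
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = block_mvs[y // block][x // block]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            pred[y][x] = prev[sy][sx]
    return pred


def decode_frame(prev, block_mvs, residual, intra, block=8):
    """Output of the adder: decoded data alone for intra pictures (switch
    open), prediction plus decoded prediction error otherwise."""
    if intra:
        return [row[:] for row in residual]
    pred = motion_compensate(prev, block_mvs, block)
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(pred, residual)]
```

The block size is a parameter only so that the sketch can be tried on very small pictures; in
the embodiment described here it would be 8.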

Fig. 5 shows an image signal reception device in accordance with the present
invention. Parts (T, Fig. 4, VSP) of this device may be part of a multi-media apparatus. A
satellite dish SD receives a motion-compensated predictively encoded image signal in
accordance with the present invention. The received signal is applied to a tuner T, the output
signal of which is applied to the decoder of Fig. 4. The decoded output signal of the decoder
of Fig. 4 is subjected to normal video signal processing operations VSP, the result of which
is displayed on a display D.

It is interesting to note that in one example (described in more detail in the first
priority application), the motion vectors (macro-block information) require 13-18% of the
total bit-rate in the basic H.263 standard, and 19-25% in the H.263 standard with APM and
UMV. UMV means Unrestricted Motion Vectors and is described in more detail in the first
priority application. Basically, UMV means that the search range is quadrupled from
[-16, +15.5] to [-31.5, +31.5].

Thanks to our method, we can use the difference between these amounts of bits
to relax the quantization of the DCT coefficients instead of encoding the motion vector
information related to blocks, so that we achieve sharper pictures than current H.263
standard image encoders with APM, without increasing the bit-rate.

On the other hand, if the quantization of the DCT coefficients is not relaxed, we can
encode and transmit "typical H.263 plus APM quality" pictures, while reducing the bit-rate
because no block motion information is transmitted, thus increasing the channel efficiency.

Finally, in our method every block is assigned its own motion vector,
while in the APM of the H.263 standard not all the macro-blocks are processed as four
separate blocks. In other words, in APM it is always possible that there will remain a
substantial number of macro-blocks to which only a single motion vector is assigned, while
our method always assigns one proper motion vector to every block.

A primary aspect of the invention can be summarized as follows. The invention
relates to a low bit-rate video coding method fully compatible with the H.263 standard and
comprising a Motion Vector Post-Filtering (MVPF) step. This MVPF step assigns a different
motion vector to every block composing a macro-block, starting from the original motion
vector of the macro-block itself. In this way the temporal prediction is based on 8*8 pixel
blocks instead of 16*16 macro-blocks, as is currently done when the negotiable option called
Advanced Prediction Mode (APM) is used in the H.263 encoder. The video decoding
terminal has to use the same MVPF step to produce the related block vectors.

Furthermore, since only macro-block vectors are differentially encoded (in a 
variable length fashion) and transmitted, a considerable bit-rate reduction is also achieved, in 
comparison with APM. 


This method is not yet H.263 standardized, so it has to be signalled between the
two terminals, via the H.245 protocol. It can be used at CIF, QCIF and SQCIF resolution.

The following salient features of the invention are noteworthy. 
A method and an apparatus realizing the method, for H.263 low bit-rate video
encoding and decoding stages, which inherently performs the same functions as the so-called
APM in terms of motion estimation and motion compensation based on 8*8 pixel blocks
instead of 16*16 macro-blocks, as is currently done only in H.263 encoders and decoders that
use the APM.

A method and an apparatus realizing the method which further includes an
MVPF step placed in the motion estimation stage of the temporal prediction loop of the
H.263 video encoder.

A method and an apparatus realizing the method which further includes an
MVPF step placed in the temporal interpolation stage of the H.263 video decoder.

A method and an apparatus realizing the method which achieves the same (or
even a superior) image quality as the APM, since the temporal prediction is based on 8*8
pixel blocks instead of 16*16 macro-blocks.

A method and an apparatus realizing the method which achieves a lower bit-rate
in comparison with APM, since only macro-block vectors are differentially encoded and
transmitted. The image quality is similar to the H.263 standard with APM.

A method and an apparatus realizing the method which achieves a superior
image quality to the H.263 standard with APM, since the bit-budget saved by encoding
and transmitting only macro-block vectors is re-used for a less coarse quantization of DCT
coefficients. The bit-rates are similar to those achievable with the H.263 standard with APM.

A method and an apparatus realizing the method where the MVPF is a block
erosion stage, when the motion estimation is calculated on macro-blocks of H.263 standard
dimensions (16*16 pixels). However, any other solution can be applied, such as a weighted
averaging of adjacent macro-block vectors.
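
The weighted-averaging alternative is not specified further in this description. Purely as an
illustration of what such a post-filter could look like, the sketch below mixes the vector of
the macro-block with those of the two adjacent macro-blocks nearest to each block. The weights
(2 for the centre, 1 for each neighbour), the choice of neighbours and the function name are
assumptions of ours, and no rounding to the vector precision of the codec is applied.

```python
def weighted_average_block_vectors(mv_c, mv_l, mv_r, mv_a, mv_b,
                                   w_center=2, w_side=1):
    """Return four block vectors (MV1..MV4) as weighted averages of the
    centre vector and the two nearest neighbouring macro-block vectors.
    Weights are hypothetical and chosen only for illustration."""
    total = w_center + 2 * w_side

    def mix(horiz, vert):
        # Average x and y components independently.
        return tuple(
            (w_center * mv_c[k] + w_side * horiz[k] + w_side * vert[k]) / total
            for k in range(2)
        )

    return (
        mix(mv_l, mv_a),  # MV1: left and above neighbours
        mix(mv_r, mv_a),  # MV2: right and above neighbours
        mix(mv_l, mv_b),  # MV3: left and below neighbours
        mix(mv_r, mv_b),  # MV4: right and below neighbours
    )
```

Unlike the median used in block erosion, an average can produce vectors that none of the
macro-blocks actually had, which smooths the vector field instead of moving its edges.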

It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
invention can be implemented by means of hardware comprising several distinct elements,
and by means of a suitably programmed computer. In the device claim enumerating several
means, several of these means can be embodied by one and the same item of hardware.
While in a preferred embodiment, 16*16 macro-blocks are reduced to 8*8 blocks, a further
reduction to quarter-blocks of size 4*4 is also possible, in which case the predictive encoding
is based on the 4*4 quarter-blocks.